Smart Paper Notes

Text classification in shipping industry using unsupervised models and Transformer based supervised models

Ying Xie , Dongping Song

Categories: Natural Language Processing | Machine Learning

2022-12-21

Obtaining labelled data in a particular context can be expensive and time consuming. Although different approaches, including unsupervised learning, semi-supervised learning, and self-learning, have been adopted, the performance of text classification varies with context. Given the lack of a labelled dataset, we propose a novel and simple unsupervised text classification model to classify cargo content in the international shipping industry using the Standard International Trade Classification (SITC) codes. Our method represents words using pretrained GloVe word embeddings and finds the most likely label using cosine similarity. To compare the unsupervised text classification model with supervised classification, we also applied several Transformer models to classify cargo content. Due to the lack of training data, the SITC numerical codes and the corresponding textual descriptions were used as training data. A small number of manually labelled cargo content records was used to evaluate the classification performance of the unsupervised classification and the Transformer based supervised classification. The comparison reveals that unsupervised classification significantly outperforms Transformer based supervised classification, even after increasing the size of the training dataset by 30%. The lack of training data is a key bottleneck that prevents deep learning models (such as Transformers) from being applied successfully in practice. Unsupervised classification can provide an efficient and effective alternative for classifying text when training data is scarce.
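The embedding-plus-cosine-similarity idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 3-dimensional vectors in `TOY_VECTORS` are made-up stand-ins for pretrained GloVe vectors, the labels `food` and `metal` are hypothetical placeholders rather than real SITC codes, and averaging word vectors is an assumed way of turning a phrase into a single vector.

```python
import math

# Made-up 3-d vectors standing in for pretrained GloVe embeddings (assumption).
TOY_VECTORS = {
    "fresh": [0.9, 0.1, 0.0],
    "fruit": [0.8, 0.2, 0.1],
    "bananas": [0.85, 0.15, 0.05],
    "steel": [0.1, 0.9, 0.2],
    "iron": [0.05, 0.95, 0.1],
    "pipes": [0.2, 0.8, 0.3],
}

# Hypothetical label -> textual description mapping (not real SITC entries).
LABELS = {"food": "fresh fruit", "metal": "iron steel"}


def embed(text):
    """Average the vectors of the known words; return None if none are known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(cargo_text, label_descriptions):
    """Return the label whose description is most similar to the cargo text."""
    cargo_vec = embed(cargo_text)
    if cargo_vec is None:
        return None
    best_label, best_score = None, -1.0
    for label, description in label_descriptions.items():
        desc_vec = embed(description)
        if desc_vec is None:
            continue
        score = cosine(cargo_vec, desc_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

No training step is involved: the label descriptions themselves act as the only supervision, which is why the approach remains usable when labelled cargo records are scarce.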

translated by Google Translate

Mitigating Artifacts in Real-World Video Super-Resolution Models

Liangbin Xie , Xintao Wang , Shuwei Shi , Jinjin Gu , Chao Dong , Ying Shan

Categories: Computer Vision

2022-12-14

The recurrent structure is a prevalent framework for the task of video super-resolution, which models the temporal dependency between frames via hidden states. When applied to real-world scenarios with unknown and complex degradations, hidden states tend to contain unpleasant artifacts and propagate them to restored frames. In this circumstance, our analyses show that such artifacts can be largely alleviated when the hidden state is replaced with a cleaner counterpart. Based on the observations, we propose a Hidden State Attention (HSA) module to mitigate artifacts in real-world video super-resolution. Specifically, we first adopt various cheap filters to produce a hidden state pool. For example, Gaussian blur filters are for smoothing artifacts while sharpening filters are for enhancing details. To aggregate a new hidden state that contains fewer artifacts from the hidden state pool, we devise a Selective Cross Attention (SCA) module, in which the attention between input features and each hidden state is calculated. Equipped with HSA, our proposed method, namely FastRealVSR, is able to achieve 2x speedup while obtaining better performance than Real-BasicVSR. Codes will be available at https://github.com/TencentARC/FastRealVSR
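The idea of building a hidden state pool with cheap filters and then aggregating it by attention can be sketched on a 1-D signal as follows. This is a heavily simplified illustration under stated assumptions, not the HSA or SCA module itself: the 3-tap blur, the unsharp-mask sharpening, and the single global dot-product score per candidate are all choices made here for brevity, and the function names are hypothetical.

```python
import math


def blur(h):
    """3-tap smoothing filter with edge replication."""
    n = len(h)
    return [
        0.25 * h[max(i - 1, 0)] + 0.5 * h[i] + 0.25 * h[min(i + 1, n - 1)]
        for i in range(n)
    ]


def sharpen(h):
    """Unsharp masking: add back the detail that blurring removes."""
    blurred = blur(h)
    return [2.0 * x - b for x, b in zip(h, blurred)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_hidden_state(features, hidden):
    """Weight each candidate hidden state by its similarity to the input features."""
    pool = [hidden, blur(hidden), sharpen(hidden)]
    scale = math.sqrt(len(features))
    # One scaled dot-product score per candidate in the pool.
    scores = [sum(f * c for f, c in zip(features, cand)) / scale for cand in pool]
    weights = softmax(scores)
    new_hidden = [
        sum(w * cand[i] for w, cand in zip(weights, pool))
        for i in range(len(hidden))
    ]
    return new_hidden, weights
```

Calling `aggregate_hidden_state(features, hidden)` returns the blended hidden state together with the attention weights, so one can inspect which filtered variant dominated for a given input.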


Collaborating Heterogeneous Natural Language Processing Tasks via Federated Learning

Chenhe Dong , Yuexiang Xie , Bolin Ding , Ying Shen , Yaliang Li

Categories: Natural Language Processing

2022-12-12

The increasing privacy concerns over personal private text data have promoted the development of federated learning (FL) in recent years. However, existing studies on applying FL in NLP are not suited to coordinating participants with heterogeneous or private learning objectives. In this study, we further broaden the application scope of FL in NLP by proposing an Assign-Then-Contrast (denoted as ATC) framework, which enables clients with heterogeneous NLP tasks to construct an FL course and learn useful knowledge from each other. Specifically, the clients are suggested to first perform local training with the unified tasks assigned by the server rather than using their own learning objectives, which is called the Assign training stage. After that, in the Contrast training stage, clients train with different local learning objectives and exchange knowledge with other clients who contribute consistent and useful model updates. We conduct extensive experiments on six widely-used datasets covering both Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks, and the proposed ATC framework achieves significant improvements compared with various baseline methods. The source code is available at https://github.com/alibaba/FederatedScope/tree/master/federatedscope/nlp/hetero_tasks.


Latent Diffusion Energy-Based Model for Interpretable Text Modeling

Peiyu Yu , Sirui Xie , Xiaojian Ma , Baoxiong Jia , Bo Pang , Ruiqi Gao , Yixin Zhu , Song-Chun Zhu , Ying Nian Wu

Categories: Machine Learning | Natural Language Processing

2022-06-13

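Latent space energy-based models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling. Fueled by the flexibility of their formulation and the strong modeling power of the latent space, recent works built upon them have made interesting attempts at interpretable text modeling. However, latent space EBMs also inherit some flaws of EBMs in data space: the degenerate MCMC sampling quality seen in practice can lead to poor generation quality and unstable training, especially on data with complex latent structures. Inspired by recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between diffusion models and latent space EBMs in a variational learning framework, which we call the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.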


VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder

Yuchao Gu , Xintao Wang , Liangbin Xie , Chao Dong , Gen Li , Ying Shan , Ming-Ming Cheng

Categories: Computer Vision

2022-05-13

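Although generative facial priors and geometric priors have recently demonstrated high-quality results for blind face restoration, producing fine-grained facial details that are faithful to the input remains a challenging problem. Motivated by classical dictionary-based methods and the recent vector quantization (VQ) technique, we propose a VQ-based face restoration method, VQFR. VQFR takes advantage of high-quality low-level feature banks extracted from high-quality faces and can thus help recover realistic facial details. However, a simple application of the VQ codebook cannot achieve good results with faithful details and identity preservation. Therefore, we further introduce two special network designs. 1) We first investigate the compression patch size in the VQ codebook and find that a VQ codebook designed with a proper compression patch size is crucial for balancing quality and fidelity. 2) To further fuse low-level features from the input without "contaminating" the realistic details generated from the VQ codebook, we propose a parallel decoder consisting of a texture decoder and a main decoder. These two decoders then interact through a texture warping module with deformable convolution. Equipped with the VQ codebook as a facial detail dictionary and the parallel decoder design, the proposed VQFR can largely enhance the restored quality of facial details while keeping fidelity comparable to previous methods.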
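The core lookup behind a VQ codebook, replacing each feature vector by its nearest codebook entry, can be sketched as follows. This is a generic illustration of vector quantization under assumed choices (squared Euclidean distance, plain Python lists, the hypothetical function name `quantize`), not the VQFR network.

```python
def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    Returns (indices, quantized), where indices[i] is the position of the
    chosen entry and quantized[i] is a copy of that entry.
    """
    indices, quantized = [], []
    for feature in features:
        # Squared Euclidean distance to every codebook entry.
        distances = [
            sum((a - b) ** 2 for a, b in zip(feature, entry))
            for entry in codebook
        ]
        best = distances.index(min(distances))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized
```

Because every output vector comes from the codebook, the quantized features can only contain details that the codebook has stored, which is why fidelity to the input has to be restored by other means such as the parallel decoder described above.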


Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

Chi Zhang , Sirui Xie , Baoxiong Jia , Ying Nian Wu , Song-Chun Zhu , Yixin Zhu

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2021-11-25

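Is intelligence realized by connectionist or classicist approaches? While connectionist approaches have achieved superhuman performance, there is growing evidence that such task-specific superiority is particularly fragile in systematic generalization. This observation lies at the center of the debate between connectionists and classicists, in which the latter continually advocate an algebraic treatment in cognitive architectures. In this work, we follow the classicists' call and propose a hybrid approach to improve systematic generalization in reasoning. Specifically, we showcase a prototype with algebraic representation for the abstract spatial-temporal reasoning task of Raven's Progressive Matrices (RPM) and present the ALgebra-Aware Neuro-Semi-Symbolic (ALANS) learner. The ALANS learner is motivated by abstract algebra and representation theory. It consists of a neural visual perception frontend and an algebraic abstract reasoning backend: the frontend summarizes the visual information from an object-based representation, while the backend transforms it into an algebraic structure and induces the hidden operator on the fly. The induced operator is later executed to predict the answer's representation, and the choice most similar to the prediction is selected as the solution. Extensive experiments show that, by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization. We further show that the learned algebraic representation can be decoded by isomorphism to generate an answer.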


Unsupervised Foreground Extraction via Deep Region Competition

Peiyu Yu , Sirui Xie , Xiaojian Ma , Yixin Zhu , Ying Nian Wu , Song-Chun Zhu

Categories: Computer Vision | Machine Learning

2021-10-29

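We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised manner. Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background. In this work, we rethink foreground extraction through generative image modeling in the form of a Mixture of Experts (MoE), and we further introduce learned pixel re-assignment as the essential inductive bias for capturing the regularities of background regions. With this modeling, the foreground-background partition can be found naturally through Expectation-Maximization (EM). We show that the proposed method effectively exploits the interaction between the mixture components during the partitioning process, which closely connects to region competition, a seminal approach to generic image segmentation. Experiments demonstrate that DRC exhibits more competitive performance on complex real-world data and challenging multi-object scenes than prior methods. Moreover, we show that DRC can potentially generalize to novel foreground objects, even from categories unseen during training.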


On Path Integration of Grid Cells: Group Representation and Isotropic Scaling

Ruiqi Gao , Jianwen Xie , Xue-Xin Wei , Song-Chun Zhu , Ying Nian Wu

Categories: Machine Learning | Machine Learning (Statistics)

2020-06-18

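Understanding how grid cells perform path integration calculations remains a fundamental problem. In this paper, we conduct a theoretical analysis of a general representation model of path integration by grid cells, in which the 2D self-position is encoded as a higher-dimensional vector and the 2D self-motion is represented by a general transformation of that vector. We identify two conditions on the transformation. One is a group representation condition that is necessary for path integration. The other is an isotropic scaling condition that ensures a locally conformal embedding, so that the error in the vector representation translates conformally to the error in the 2D self-position. We then investigate the simplest transformation, namely the linear transformation, uncover its explicit algebraic and geometric structure as matrix rotations, and explore the connection between the isotropic scaling condition and a special class of hexagonal grid patterns. Finally, with an optimization-based approach, we manage to learn hexagonal grid patterns that share similar properties with the grid cells in the rodent brain. The learned model is capable of accurate long-distance path integration. Code is available at https://github.com/ruiqigao/grid-cell-path.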
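The group representation condition can be checked numerically on a toy example. In the sketch below, a 2D position x is encoded as v(x) = (cos(a·x), sin(a·x)) and a displacement dx acts on it through a 2x2 rotation matrix M(dx) with angle a·dx, so that M(dx1 + dx2) = M(dx1) M(dx2). The frequency vector `A` and all function names are hypothetical choices for illustration; a single rotation block like this one shows only the group representation condition and makes no attempt to satisfy the isotropic scaling condition.

```python
import math

# Hypothetical frequency vector that projects a 2D position onto a phase.
A = (1.3, -0.7)


def phase(p):
    """Project a 2D position or displacement onto a scalar phase angle."""
    return A[0] * p[0] + A[1] * p[1]


def encode(x):
    """Encode a 2D self-position as a point on the unit circle."""
    theta = phase(x)
    return [math.cos(theta), math.sin(theta)]


def motion_matrix(dx):
    """Rotation matrix that represents the 2D self-motion dx."""
    theta = phase(dx)
    return [
        [math.cos(theta), -math.sin(theta)],
        [math.sin(theta), math.cos(theta)],
    ]


def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def integrate_path(start, steps):
    """Update the encoded position using only the sequence of self-motions."""
    v = encode(start)
    for dx in steps:
        v = matvec(motion_matrix(dx), v)
    return v
```

Path integration here means that `integrate_path(start, steps)` agrees with `encode` applied to the true end position, even though the loop never sees the position itself, only the displacements.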


Boosting Neural Networks to Decompile Optimized Binaries

Ying Cao , Ruigang Liang , Kai Chen , Peiwei Hu

Categories: Machine Learning

2023-01-03

Decompilation aims to transform a low-level program language (LPL) (e.g., binary file) into its functionally-equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.


ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Sanghyun Woo , Shoubhik Debnath , Ronghang Hu , Xinlei Chen , Zhuang Liu , In So Kweon , Saining Xie

Categories: Computer Vision

2023-01-02

Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
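The abstract does not spell out how Global Response Normalization works, so the sketch below follows one common formulation as an assumption: take the L2 norm of each channel over all spatial positions, divide it by the mean norm across channels, and use the result to rescale the input inside a residual connection with learnable `gamma` and `beta`. The flat list-of-positions layout and the function name `grn` are choices made here for illustration only.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Global Response Normalization on x, a list of spatial positions,
    where each position is a list of C channel values (assumed layout)."""
    num_channels = len(x[0])
    # Global aggregation: L2 norm of each channel over all positions.
    gx = [
        math.sqrt(sum(pos[c] ** 2 for pos in x))
        for c in range(num_channels)
    ]
    # Divisive normalization: each channel's norm relative to the channel mean.
    mean_gx = sum(gx) / num_channels
    nx = [g / (mean_gx + eps) for g in gx]
    # Calibration with learnable scale and shift, plus a residual connection.
    return [
        [gamma[c] * pos[c] * nx[c] + beta[c] + pos[c] for c in range(num_channels)]
        for pos in x
    ]
```

With `gamma` and `beta` set to zero the layer reduces to the identity, so under this formulation it can be added to an existing block without changing its initial behavior. Channels with a larger global norm are amplified relative to weaker ones, which is one way to read the "inter-channel feature competition" mentioned in the abstract.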
